Name: Jingcheng Jiang
I used two datasets of US Covid-19 case statistics:
Source: 'https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv' from New York Times.
Source: 'https://query.data.world/s/sc4gq2roysjsytksfhvhkoybk5xm2j' from Johns Hopkins.
# Import modules
import pandas as pd
import plotly.express as px
from urllib.request import urlopen
import json
# Read in datasets
states_url = 'https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv'
states_df = pd.read_csv(states_url)
counties_df = pd.read_csv('https://query.data.world/s/sc4gq2roysjsytksfhvhkoybk5xm2j')
# Drop county rows that have any missing value
counties_df = counties_df.dropna(how='any', axis=0)
To place each state on the map in the following plot, we need to add one attribute to the dataset: the abbreviation of each state.
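# Use the state code file as a lookup table to add the state_code attribute
statenames = pd.read_csv("state_code.csv")
name_to_code = dict(zip(statenames['State'], statenames['Code']))
# Assign the result back instead of calling replace(..., inplace=True) on a
# column selection, which may not modify states_df in recent pandas versions.
# Names missing from the lookup table are left unchanged, as in the original loop.
states_df['state_code'] = states_df['state'].replace(name_to_code)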
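Any name that is absent from the lookup file keeps its full spelling, and Plotly cannot place such a value on a `USA-states` map. The following is a minimal sketch of a mapping step that also reports unmatched names. The helper name `add_state_codes` and the toy tables are illustrative assumptions, not part of the original notebook; the column names `State` and `Code` follow the code above.

```python
import pandas as pd

def add_state_codes(df, codes):
    """Return a copy of df with a 'state_code' column, plus the unmatched names.

    Assumes df has a 'state' column and codes has 'State' and 'Code' columns,
    mirroring how state_code.csv is used above.
    """
    name_to_code = dict(zip(codes["State"], codes["Code"]))
    out = df.copy()
    # map() yields NaN for names that are missing from the lookup table
    out["state_code"] = out["state"].map(name_to_code)
    unmatched = sorted(out.loc[out["state_code"].isna(), "state"].unique())
    return out, unmatched

# Toy data standing in for states_df and state_code.csv (hypothetical values)
toy_states = pd.DataFrame({"state": ["Illinois", "New York", "Guam"],
                           "cases": [10, 20, 1]})
toy_codes = pd.DataFrame({"State": ["Illinois", "New York"],
                          "Code": ["IL", "NY"]})

coded, unmatched = add_state_codes(toy_states, toy_codes)
print(unmatched)
```

Rows listed in `unmatched` could then be dropped or given a code by hand before plotting.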
Take a look at the two datasets used.
states_df
counties_df
Since the state-based dataset is organized by date, I decided to make a scatter map plot with an interactive timeline.
By dragging the timeline in the visualization, users can clearly see the growth of cases in each state from January 21 to April 24, 2020.
In addition, by hovering the mouse over a state on the map plot, users can see a label with the state name, the number of confirmed cases, the number of deaths, etc.
Brighter colors and larger bubbles indicate more confirmed cases in the state.
fig = px.scatter_geo(states_df, locations='state_code',
                     locationmode='USA-states', color='cases',
                     color_continuous_scale=px.colors.sequential.Agsunset,
                     hover_name='state', size='cases', size_max=80,
                     hover_data=['deaths'], scope='usa',
                     title='USA Covid-19 Cases_States Based',
                     animation_frame='date')
fig.show()
In the following part, we are going to explore the real-time data for each county.
To locate each county in the dataset on the US map, we need its FIPS code, which is the unique identifier of each county in the US, together with a GeoJSON file of county boundaries keyed by FIPS code.
# Download the county boundaries; each feature's id is the county FIPS code
with urlopen('https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json') as response:
    counties = json.load(response)
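The feature ids in this GeoJSON file are five-character strings with leading zeros (for example `"01001"`), so the `fips_code` column only matches if it has the same form. The sketch below shows one way to normalize codes; it assumes, without having checked the dataset, that pandas may have parsed `fips_code` as an integer or float. The helper name `normalize_fips` is hypothetical.

```python
def normalize_fips(value):
    """Convert a FIPS code read as int, float, or str into a 5-character string."""
    text = str(value).strip()
    # A float such as 1001.0 would otherwise keep its ".0" suffix
    if text.endswith(".0"):
        text = text[:-2]
    # Restore leading zeros dropped by numeric parsing
    return text.zfill(5)

print(normalize_fips(1001))     # 01001
print(normalize_fips(17019.0))  # 17019
print(normalize_fips("06037"))  # 06037
```

If the column does turn out to be numeric, it could be applied with `counties_df['fips_code'] = counties_df['fips_code'].map(normalize_fips)` before plotting.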
After reading in the county boundaries, we can make our second visualization: county-based Covid-19 cases.
Although some county data is temporarily missing, we can see the epidemic situation of each county from the shade of its color.
The darker the color, the more confirmed cases. Similarly, when users hover the mouse over a county, they can see a label containing the county and state name, the number of confirmed cases, and the number of deaths.
fig = px.choropleth_mapbox(counties_df, geojson=counties,
                           locations='fips_code', color='confirmed',
                           color_continuous_scale=px.colors.sequential.Blues,
                           range_color=[0, 200],
                           mapbox_style="carto-positron",
                           zoom=3,
                           center={"lat": 37, "lon": -95},
                           hover_name='county_name_long',
                           hover_data=['deaths'],
                           title='USA Covid-19 Cases_County Based')
# Counts above 200 share the darkest color, so label the top tick as '>200'
fig.update_layout(coloraxis_colorbar=dict(
    tickvals=[0, 50, 100, 150, 200],
    ticktext=['0', '50', '100', '150', '>200']))
fig.show()
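Because `range_color=[0, 200]` clips the color scale, every county with more than 200 cases gets the same darkest shade, which is why the top tick is relabeled. As a small sketch, the tick values and labels can be generated for any cap rather than typed by hand. The helper `capped_ticks` is a hypothetical name, and it assumes the cap is a multiple of the step.

```python
def capped_ticks(cap, step):
    """Return (tickvals, ticktext) for a color bar clipped at `cap`.

    Assumes `cap` is a positive multiple of `step`.
    """
    tickvals = list(range(0, cap + 1, step))
    ticktext = [str(v) for v in tickvals]
    # Values above the cap share the top color, so mark the last tick as open-ended
    ticktext[-1] = f">{cap}"
    return tickvals, ticktext

tickvals, ticktext = capped_ticks(200, 50)
print(tickvals)  # [0, 50, 100, 150, 200]
print(ticktext)  # ['0', '50', '100', '150', '>200']
```

The two lists can be passed to `fig.update_layout(coloraxis_colorbar=dict(tickvals=tickvals, ticktext=ticktext))`.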
The following visualization shows the confirmed cases and deaths in each county.
In the scatter plot, users can check the number of confirmed cases, the number of deaths, and the urbanization degree of each county by hovering the mouse over its point.
From this visualization, we can see that the counties with large numbers of confirmed cases and deaths are tagged as either "Large Central Metro" or "Large Fringe Metro".
Note: To make the visualization clearer, I excluded the counties with 40,000 or more confirmed cases.
new_counties_df = counties_df[counties_df["confirmed"] < 40000]
fig = px.scatter(new_counties_df, x="confirmed", y="deaths",
                 hover_name='county_name_long',
                 hover_data=['NCHS_urbanization'],
                 title="Confirmed Cases and Deaths in Each County")
fig.show()
Finally, I made a bar chart indicating the "Total Confirmed Cases of each Urbanization Degree of US Counties".
# Get the grouped data
total_confirmed = counties_df.groupby("NCHS_urbanization")["confirmed"].sum().reset_index()
total_confirmed
The bar chart below shows the relationship between the total number of confirmed cases and the degree of urbanization: the higher the urbanization degree, the worse the epidemic situation seems.
fig = px.bar(total_confirmed,
             x='NCHS_urbanization',
             y='confirmed',
             color='confirmed',
             title="Total Confirmed Cases of each Urbanization Degree of US Counties")
fig.show()
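To put a number on how concentrated the cases are, the grouped totals can be converted into shares of the national total. The sketch below uses a small made-up table in place of `total_confirmed`, so the printed share is not a real statistic. The category labels follow the wording used in this notebook and may be spelled differently in the dataset.

```python
import pandas as pd

# Made-up totals standing in for total_confirmed; these are not real case counts
toy_totals = pd.DataFrame({
    "NCHS_urbanization": ["Large Central Metro", "Large Fringe Metro",
                          "Small Metro", "Non-Core"],
    "confirmed": [400, 200, 250, 150],
})

# Share of all confirmed cases that falls in each urbanization category
shares = toy_totals.set_index("NCHS_urbanization")["confirmed"]
shares = shares / shares.sum()

# Combined share of the two large-metro categories
large_metro_share = shares[["Large Central Metro", "Large Fringe Metro"]].sum()
print(round(large_metro_share, 2))
```

Running the same steps on the real `total_confirmed` table would give the actual share for the large-metro categories.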
The U.S. epidemic expanded rapidly in March 2020, and East and West Coast states had the more severe outbreaks. As of April 24, 2020, New York and New Jersey ranked first and second in outbreak severity.
Counties with large numbers of confirmed cases and deaths are labeled as either "Large Central Metro" or "Large Fringe Metro".
The total number of confirmed cases in counties labeled as "Large Central Metro" and "Large Fringe Metro" is much higher than in counties with other urbanization levels; these counties account for 80% of the total confirmed cases in the US.
Thanks for reading! If you have any questions, please feel free to contact me by email at jj21@illinois.edu.